Bounds for Regret-Matching Algorithms
Authors
Abstract
We introduce a general class of learning algorithms, regret-matching algorithms, and a regret-based framework for analyzing their performance in online decision problems. Our analytic framework is based on a set Φ of transformations over the set of actions. Specifically, we calculate a Φ-regret vector by comparing the average reward obtained by an agent over some finite sequence of rounds to the average reward that could have been obtained had the agent instead played each transformation φ ∈ Φ of its sequence of actions. The regret-matching algorithms analyzed here select the agent's next action based on the vector of Φ-regrets, along with a link function f. Many well-studied learning algorithms are seen to be instances of regret matching. We derive bounds on the regret experienced by (f, Φ)-regret-matching algorithms for polynomial and exponential link functions (though we consider polynomial link functions for p > 1 rather than p ≥ 2). Although we do not improve upon the bounds reported in past work (except in special cases), our means of analysis is more general, in part because we do not rely directly on Taylor's theorem. Hence, we can analyze algorithms based on a larger class of link functions, particularly non-differentiable link functions. In ongoing work, we are studying regret matching with link functions other than polynomial and exponential.
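The abstract's update rule can be illustrated with a minimal sketch. This example assumes the simplest instantiation: Φ is the set of constant transformations (i.e., external regret, one regret per action) and the link function is polynomial, f(x) = max(x, 0)^(p-1). The function name and the uniform fallback when no regret is positive are illustrative choices, not specifics from the paper.

```python
import numpy as np

def regret_matching_step(cum_regret, p=2):
    """Map a cumulative regret vector to a mixed action.

    Assumes external regret (constant transformations Φ) and the
    polynomial link f(x) = max(x, 0)^(p-1); p = 2 recovers classic
    Hart-Mas-Colell regret matching, where each action is played
    with probability proportional to its positive regret.
    """
    link = np.maximum(cum_regret, 0.0) ** (p - 1)
    total = link.sum()
    if total > 0:
        return link / total
    # No transformation has positive regret: any action is fine;
    # fall back to the uniform distribution as one valid choice.
    return np.full(len(cum_regret), 1.0 / len(cum_regret))

# Example: regrets (3, -1, 1) with p = 2 give play probabilities
# proportional to the positive parts (3, 0, 1).
probs = regret_matching_step(np.array([3.0, -1.0, 1.0]), p=2)
print(probs)  # [0.75 0.   0.25]
```

Larger p concentrates play on the highest-regret actions; the exponential link f(x) = exp(ηx) analyzed in the paper would replace the `np.maximum(...)**(p-1)` line accordingly.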
Similar Articles
Bounds for Regret-Matching Algorithms
We study a general class of learning algorithms, which we call regret-matching algorithms, along with a general framework for analyzing their performance in online (sequential) decision problems (ODPs). In each round of an ODP, an agent chooses a probabilistic action and receives a reward. The particular reward function that applies at any given round is not revealed until after the agent acts....
Fair Algorithms for Infinite Contextual Bandits
We study fairness in infinite linear bandit problems. Starting from the notion of meritocratic fairness introduced in Joseph et al. [9], we expand their notion of fairness for infinite action spaces and provide an algorithm that obtains a sublinear but instance-dependent regret guarantee. We then show that this instance dependence is a necessary cost of our fairness definition with a matching l...
No-regret algorithms for structured prediction problems—DRAFT
No-regret algorithms are a popular class of learning rules which map a sequence of input vectors x1, x2 . . . to a sequence of predictions y1, y2, . . .. Unfortunately, most no-regret algorithms assume that the predictions yt are chosen from a small, discrete set. We consider instead prediction problems where yt has internal structure: yt might be a strategy in a game like poker, or a configura...
Regret bounds for Non Convex Quadratic Losses Online Learning over Reproducing Kernel Hilbert Spaces
We present several online algorithms with dimension-free regret bounds for general nonconvex quadratic losses by viewing them as functions in Reproducing Kernel Hilbert Spaces. In our work we adapt the Online Gradient Descent, Follow the Regularized Leader, and Conditional Gradient meta-algorithms to the RKHS setting and provide regret bounds in this setting. By analyzing them as algorith...
No-regret algorithms for Online Convex Programs
Online convex programming has recently emerged as a powerful primitive for designing machine learning algorithms. For example, OCP can be used for learning a linear classifier, dynamically rebalancing a binary search tree, finding the shortest path in a graph with unknown edge lengths, solving a structured classification problem, or finding a good strategy in an extensive-form game. Several res...